AITopics | synthetic dataset

Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation

Neural Information Processing Systems | Apr-30-2026, 07:08:31 GMT

Preparing training data for deep vision models is a labor-intensive task. To address this, generative models have emerged as an effective solution for generating synthetic data. While current generative models produce image-level category labels, we propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion (SD). By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation. These techniques enable us to generate segmentation maps corresponding to synthetic images. These maps serve as pseudo-labels for training semantic segmenters, eliminating the need for labor-intensive pixel-wise annotation. To account for the imperfections in our pseudo-labels, we incorporate uncertainty regions into the segmentation, allowing us to disregard loss from those regions. We conduct evaluations on two datasets, PASCAL VOC and MS COCO, and our approach significantly outperforms concurrent work.
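
The idea of disregarding loss from uncertainty regions can be illustrated with a minimal sketch. This is not the authors' implementation: the `IGNORE` label value, the two thresholds, and both function names are assumptions made for illustration only. Pixels whose best class score falls between the thresholds are marked as uncertain and contribute nothing to the cross-entropy.

```python
import math

IGNORE = 255  # hypothetical label marking uncertain pixels


def make_pseudo_labels(scores, low=0.3, high=0.6):
    """Turn per-class scores into labels with an uncertainty band.

    scores: list of pixels, each a list of per-class scores in [0, 1].
    Returns one label per pixel: 0 = background, c + 1 = class c,
    IGNORE = score too ambiguous to trust. Thresholds are illustrative.
    """
    labels = []
    for pixel in scores:
        best = max(range(len(pixel)), key=lambda c: pixel[c])
        conf = pixel[best]
        if conf >= high:
            labels.append(best + 1)
        elif conf <= low:
            labels.append(0)
        else:
            labels.append(IGNORE)
    return labels


def masked_cross_entropy(logits, labels):
    """Mean cross-entropy over pixels whose label is not IGNORE.

    logits: list of pixels, each a list of K logits (index 0 = background).
    """
    total, count = 0.0, 0
    for z, y in zip(logits, labels):
        if y == IGNORE:
            continue  # uncertain pixel: contributes no loss
        m = max(z)
        log_norm = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_norm - z[y]
        count += 1
    return total / count if count else 0.0


labels = make_pseudo_labels([[0.1], [0.45], [0.9]])
loss = masked_cross_entropy([[0.0, 0.0], [5.0, -5.0], [0.0, 0.0]], labels)
```

In this toy input the middle pixel lands in the uncertainty band, so its logits have no effect on `loss`, however wrong the prediction there may be.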

large language model, machine learning, natural language, (19 more...)

Country: Europe (0.46)

Genre: Research Report > Promising Solution (0.34)

Industry:

Transportation > Ground > Road (0.46)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ddbbcd937d63d5c6b935c07b1a8222ec-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-30-2026, 00:36:08 GMT

artificial intelligence, data quality, machine learning, (21 more...)

Genre:

Research Report (0.46)
Overview (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

An Efficient Dataset Condensation Plugin and Its Application to Continual Learning

Neural Information Processing Systems | Apr-29-2026, 22:05:55 GMT

Dataset condensation (DC) distills a large real-world dataset into a small synthetic dataset, so that a network trained from scratch on the latter performs similarly to one trained on the former. State-of-the-art (SOTA) DC methods have achieved satisfactory results through techniques such as accuracy, gradient, training trajectory, or distribution matching. However, these works all perform matching in the high-dimensional pixel space, ignoring that natural images are usually locally connected and have lower intrinsic dimensions, resulting in low condensation efficiency. In this work, we propose a simple-yet-efficient dataset condensation plugin that matches the raw and synthetic datasets in a low-dimensional manifold.
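
As a rough illustration of matching two datasets in a low-dimensional space rather than in pixel space, the sketch below compares the mean embeddings of real and synthetic samples after a fixed linear projection. It is a generic distribution-matching toy, not the plugin described in the abstract: the `basis` matrix, the function names, and the choice of a linear projection are all assumptions.

```python
def project(x, basis):
    """Project vector x onto each row of basis (one row per direction)."""
    return [sum(b_i * x_i for b_i, x_i in zip(row, x)) for row in basis]


def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def matching_loss(real, synthetic, basis):
    """Squared distance between the mean low-dimensional embeddings."""
    mu_real = mean_vector([project(x, basis) for x in real])
    mu_syn = mean_vector([project(x, basis) for x in synthetic])
    return sum((a - b) ** 2 for a, b in zip(mu_real, mu_syn))


# Hypothetical 4-dimensional "images" matched in a 2-dimensional subspace.
basis = [[1, 0, 0, 0], [0, 1, 0, 0]]
real = [[1, 0, 0, 0], [3, 0, 0, 0]]
loss_close = matching_loss(real, [[2, 0, 5, 7]], basis)
loss_far = matching_loss(real, [[0, 1, 0, 0]], basis)
```

Here the first synthetic sample differs from the real data only along directions outside the subspace, so its loss is zero, while the second differs inside the subspace and is penalized. In a condensation loop the synthetic samples would be updated to reduce this loss.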

artificial intelligence, machine learning, natural language, (19 more...)

Country: North America > United States (0.28)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(4 more...)

d1881b5125b4e9cf42f6d6d0b6575934-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-29-2026, 20:50:19 GMT

artificial intelligence, machine learning, matrix, (17 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

a10946e1f46e1ffc0daf37cb2abfdcad-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-29-2026, 05:39:10 GMT

algorithm, artificial intelligence, machine learning, (19 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

b1656d20067ca7c84a33785c4083a75e-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-27-2026, 05:23:51 GMT

artificial intelligence, preferential value function, ref -shap, (15 more...)

Technology: Information Technology > Artificial Intelligence (0.70)

MIM4DD: Mutual Information Maximization for Dataset Distillation

Neural Information Processing Systems | Apr-25-2026, 23:44:23 GMT

A.1 Invariance of Mutual Information. Theorem 1 (Invariance of Mutual Information): Mutual information is invariant under reparametrization of the marginal variables.
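
The statement can be made concrete with the standard change-of-variables argument. The sketch below assumes continuous variables with densities and smooth invertible maps $f$ and $g$; these symbols are introduced here for illustration, and the snippet does not show how the paper itself words the proof.

```latex
Let $X' = f(X)$ and $Y' = g(Y)$, where $f$ and $g$ are smooth invertible maps
with Jacobians $J_f$ and $J_g$. Then
\begin{align*}
p(x', y') &= p(x, y)\,|\det J_f(x)|^{-1}\,|\det J_g(y)|^{-1},\\
p(x') &= p(x)\,|\det J_f(x)|^{-1}, \qquad
p(y') = p(y)\,|\det J_g(y)|^{-1},\\
I(X'; Y') &= \int p(x', y') \log \frac{p(x', y')}{p(x')\,p(y')}\,dx'\,dy'\\
          &= \int p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)}\,dx\,dy = I(X; Y).
\end{align*}
```

The Jacobian factors cancel inside the logarithm, and the factor left on $p(x', y')$ is absorbed by $dx'\,dy' = |\det J_f(x)|\,|\det J_g(y)|\,dx\,dy$.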

artificial intelligence, dataset, machine learning, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

MIM4DD: Mutual Information Maximization for Dataset Distillation

Neural Information Processing Systems | Apr-25-2026, 23:44:20 GMT

Dataset distillation (DD) aims to synthesize a small dataset whose test performance is comparable to a full dataset using the same model. State-of-the-art (SoTA) methods optimize synthetic datasets primarily by matching heuristic indicators extracted from two networks: one from real data and one from synthetic data (see Figure 1, Left), such as gradients and training trajectories. DD is essentially a compression problem that emphasizes maximizing the preservation of information contained in the data. We argue that well-defined metrics which measure the amount of shared information between variables in information theory are necessary for success measurement but are never considered by previous works. Thus, we introduce mutual information (MI) as the metric to quantify the shared information between the synthetic and the real datasets, and devise MIM4DD, which numerically maximizes the MI via a newly designed optimizable objective within a contrastive learning framework to update the synthetic dataset. Specifically, we designate samples in different datasets that share the same label as positive pairs, and samples with different labels as negative pairs. We then pull together the samples in positive pairs and push apart the samples in negative pairs in the contrastive space by minimizing the NCE loss. As a result, the targeted MI can be transformed into a lower bound represented by feature maps of samples, which is numerically feasible. Experimental results show that MIM4DD can be implemented as an add-on module to existing SoTA DD methods.
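
The pairing rule described above can be sketched with a generic InfoNCE-style loss. This is not the MIM4DD objective itself: the function names, the dot-product similarity, and the `temperature` value are assumptions. Each synthetic feature acts as an anchor, real features with the same label are its positives, and real features with a different label are its negatives.

```python
import math


def dot(u, v):
    """Dot-product similarity between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def nce_loss(syn_feats, syn_labels, real_feats, real_labels, temperature=0.5):
    """Average InfoNCE-style loss over synthetic anchors.

    For each synthetic feature, real features sharing its label are
    positives and all other real features are negatives.
    """
    losses = []
    for f, y in zip(syn_feats, syn_labels):
        scores = [math.exp(dot(f, g) / temperature) for g in real_feats]
        pos = sum(s for s, yr in zip(scores, real_labels) if yr == y)
        if pos == 0.0:
            continue  # no positive exists for this anchor
        losses.append(-math.log(pos / sum(scores)))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical 2-dimensional features for two classes.
real_feats, real_labels = [[1.0, 0.0], [0.0, 1.0]], [0, 1]
aligned = nce_loss([[1.0, 0.0], [0.0, 1.0]], [0, 1], real_feats, real_labels)
misaligned = nce_loss([[0.0, 1.0], [1.0, 0.0]], [0, 1], real_feats, real_labels)
```

The loss is lower when each synthetic feature sits near the real features of its own class, so minimizing it with respect to the synthetic data pulls positive pairs together and pushes negative pairs apart.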

artificial intelligence, deep learning, machine learning, (15 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

312f1ba2a72318edaaa995a67835fad5-Supplemental.pdf

Neural Information Processing Systems | Apr-25-2026, 09:02:37 GMT

artificial intelligence, dataset, machine learning, (19 more...)

Industry: Health & Medicine (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.71)

291d43c696d8c3704cdbe0a72ade5f6c-Supplemental.pdf

Neural Information Processing Systems | Apr-25-2026, 05:12:23 GMT

A.1 Broader impact. Our work introduces a general method for unsupervised 3D segmentation that can be used for any 3D voxel-grid data. This line of work is especially useful for analyzing biomedical data, as many different types of biomedical data are in volumetric form and lack the ground truth annotations required for fully or semi-supervised segmentation. For example, we may wish to study diseased tissue but do not have sufficient understanding to ensure that unexplored features of interest are labelled in training data. We illustrate the potential of our proposed approach for scientific discovery applications using our example of cryo-ET data in the Appendix. The discovered features can now be analyzed for their chemical identities and functions, in diseased vs. healthy cells.

artificial intelligence, dataset, machine learning, (19 more...)

Industry: